Abstract 

Consider a sequence $\{X_n\}_{n=1}^{\infty}$ of i.i.d. uniform random variables 
taking values in the alphabet set {1, 2, . . . , d}. A k-superpattern is a 
realization of $\{X_n\}_{n=1}^{t}$ that contains, as an embedded subsequence, 
each of the non-order-isomorphic subpatterns of length k. We focus 
on the (non-trivial!) case of d = k = 3 and study the waiting time 
distribution of $\tau = \inf\{t \ge 7 : \{X_n\}_{n=1}^{t} \text{ is a superpattern}\}$. 

1 Introduction 

A string of integers with values from the set {1, 2, . . . , d} (equivalently, a word 
on the d-letter alphabet) is said to contain a pattern if any order-isomorphic 
subsequence of that pattern can be found within that word. For example, 
the word 5371473 contains the subsequences 571, 574, and 473, each of which 
is order-isomorphic to the string 231. We call the string 231 the pattern that 
is contained in the word since it is comprised of the lowest possible ordinal 
numbers that are order isomorphic to any of these three sequences. In the 
literature, the term pattern is often reserved for strings of characters in which 
each character is unique. This traditional definition of pattern is adhered to 
in this paper, while the term preferential arrangement denotes those strings 
of characters in which repeated characters are allowed, but not necessary. 
The word 5371473 in the previous example also contains the subsequences 
373 and 343 which are both order-isomorphic to the string 121; thus both the 
string 121 and the string 231 are preferential arrangements contained in the 
parent string. This order isomorphism on the preferential arrangements is 
equivalent to a dense ranking system, where items that are equal receive the 
same ranking number, and the next highest item(s) receive the next highest 
ranking number. The number of preferential arrangements of length n on n 
symbols is given by the sequence of ordered Bell numbers, whose first few 
elements are 1, 3, 13, 75, . . .; see, e.g., [15]. 
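
The dense-ranking description above can be made concrete with a short sketch. The helper names (`dense_rank`, `contained_arrangements`, `count_preferential_arrangements`) are our own and are not taken from the paper; the code simply ranks each subsequence and, separately, counts preferential arrangements by brute force for small n.

```python
from itertools import combinations, product


def dense_rank(seq):
    """Return the preferential arrangement that is order-isomorphic to seq."""
    rank = {value: r for r, value in enumerate(sorted(set(seq)), start=1)}
    return tuple(rank[value] for value in seq)


def contained_arrangements(word, k):
    """Preferential arrangements of length k embedded in word as subsequences."""
    return {dense_rank(sub) for sub in combinations(word, k)}


def count_preferential_arrangements(n):
    """Count words of length n over {1, ..., n} that equal their own dense ranking."""
    alphabet = range(1, n + 1)
    return sum(1 for w in product(alphabet, repeat=n) if dense_rank(w) == w)


word = (5, 3, 7, 1, 4, 7, 3)
found = contained_arrangements(word, 3)
print(dense_rank((5, 7, 1)), dense_rank((3, 7, 3)))
print((2, 3, 1) in found, (1, 2, 1) in found)
print([count_preferential_arrangements(n) for n in range(1, 5)])
```

The last line reproduces the first four ordered Bell numbers quoted in the text; the brute-force count is only practical for very small n.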

The systematic study of pattern containment was first proposed by Herb 
Wilf in his 1992 address to the SIAM meeting on Discrete Mathematics. 
However, most results on pattern containment deal more directly with pattern 
avoidance, specifically the enumeration and characterization of strings which 
avoid a given pattern or set of patterns. The first results in this area are due 
to Knuth [12]. For example, if $\pi \in S_n$ is a random permutation (not word) 
then the probability that it avoids the pattern 123 is given by $C_n/n!$, where 
$C_n = \frac{1}{n+1}\binom{2n}{n}$ are the Catalan numbers. The numbers of 132, 231, 213, 312, and 
321-avoiding permutations are also given by the Catalan numbers, which by 
Stirling's approximation are asymptotic to $K \cdot 4^n / n^{3/2}$ for some constant K. The 
Stanley-Wilf conjecture, namely that the number of permutations that avoid 
a fixed k-pattern is asymptotic to $C^n$ for some constant $0 < C < \infty$, was 
proved in [13]. 
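
As a small sanity check of the Catalan count for 123-avoiding permutations, the following sketch enumerates permutations directly. The function names are ours, and the brute force is only meant for small n.

```python
from itertools import combinations, permutations
from math import comb


def avoids_123(perm):
    """True if perm has no increasing subsequence of length three."""
    return not any(a < b < c for a, b, c in combinations(perm, 3))


def count_123_avoiders(n):
    """Number of permutations of {1, ..., n} that avoid the pattern 123."""
    return sum(1 for p in permutations(range(1, n + 1)) if avoids_123(p))


def catalan(n):
    """The n-th Catalan number, C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


for n in range(1, 8):
    print(n, count_123_avoiders(n), catalan(n))
```

Dividing either column by n! gives the avoidance probability mentioned in the text.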

Of the few results available on pattern containment, most deal with specified 
sets of patterns contained in fixed length permutations, i.e. strings without 
repeated letters; here we cite the work in [2], [1], [7], [9], [H]. Research 
in this area mainly includes enumerating maximum occurrences of a given set 
of patterns ("packings"), which may only include one pattern, contained in a 
permutation of fixed length. Burstein et al. [6] have expanded this research 
further by not only removing the permutation requirement, thereby allowing 
for repeated letters in the word that is to contain the set of patterns, but also 
allowing repeated letters within the patterns themselves. This work, and the 
references therein, seem to be closest in spirit to the work undertaken in the 
present paper. We are specifically interested in the problem in [6] regarding 
the word length required for a word to contain all preferential arrangements 
of a given length. We define a superpattern to be a word which contains all 
preferential arrangements of a given length. Given $k, d \in \mathbb{Z}^+$, let n(k, d) be 
the length of the shortest string that contains all preferential arrangements of length k on 
an alphabet of size d. Since n(k, d) = n(k, k) for d > k, it suffices to consider 
the case $d \le k$. The authors of [6] prove the following results: 

Lemma 1.1. n(2, 2) = 3 and for any $d \ge 3$, $n(d, d) \le d^2 - 2d + 4$. 

Lemma 1.2. For any $k \ge d \ge 3$, $n(k, d) \le (k - 2)d + 4$. 

They also conjecture that for all $d \ge 3$, $n(d, d) = d^2 - 2d + 4$, which they 
argue is a very hard open problem. 

In this paper, we tackle the following random version of the extremal 
work mentioned in the previous paragraph: Consider a sequence $\{X_n\}_{n=1}^{\infty}$ of 
i.i.d. uniform random variables taking on values in the alphabet set 
{1, 2, . . . , d}. A k-superpattern is a realization of $\{X_n\}_{n=1}^{t}$ that contains, as 
an embedded subsequence, each of the preferential arrangements of length 
k. After disposing of the case of d = k = 2 in Section 2, we focus on the 
(non-trivial!) case of d = k = 3 in Section 3, and study the waiting time 
distribution of $\tau = \inf\{t \ge 7 : \{X_n\}_{n=1}^{t} \text{ is a superpattern}\}$. Here the infimum 
is taken over $t \ge 7$ in light of Lemmas 3.2 and 3.3 below. As pointed out in Fu 
[10], such problems are hard even for small k; there he studies the number of 
occurrences of the pattern 123 in a random permutation. Another probability 
distribution that is in the spirit of the work undertaken here can be found 
in [8], where the authors study the distribution of the first occurrence of a 
3-ascending pattern. It would be interesting, moreover, to see if the Markov 
chain embedding method (Fu and Koutras [11], Balakrishnan and Koutras 
[3]) can be used to good effect to make further progress in this area. 

We end this section with some analogies drawn from [1]. If, instead of 
considering preferential arrangements, we ask for the waiting time W until 
every word of length k over a d-letter alphabet is seen, then the problem 
becomes easier, in the sense that E(W) and V(W) can be easily computed, 
but remains elusive as far as the exact waiting time distribution is concerned. 
It is shown in [1] that the distribution of W is the same as that of the waiting 
time until k disjoint coupon collections from the coupon set {1, 2, . . . , d} are 
obtained. Further analyses and limit theorems are given in that paper. 

2 Binary Alphabet 



Some further classification of superpatterns is necessary for clarity in this 
paper. Let a minimal superpattern be a superpattern in which no two adjacent 
letters are the same. A minimum superpattern is a minimal superpattern 
of the shortest length possible, i.e., one in which every letter is necessary 
for the containment of all preferential arrangements. Let a strict superpattern 
be a superpattern in which the last letter of the superpattern is needed 
to complete one of the preferential arrangements contained in the superpattern. 
Clearly, all minimum superpatterns are strict superpatterns, but not 
conversely. Specifically, a strict superpattern may contain extraneous repeat 
letters; e.g., for k = d = 2, 121 is a minimum superpattern, but 111221 is a 
strict non-minimum superpattern. 

In the binary case, a superpattern is a word that contains all the preferential 
arrangements, namely 11, 12, and 21. The waiting time $\tau$ for a binary 
string to be a superpattern satisfies: $\tau = n$ iff there exist precisely two runs 
among the first n - 1 letters of the word and the nth letter is the letter 
that correctly completes a minimum superpattern. The number of ways to 
partition n - 1 letters into 2 non-empty parts is n - 2. Since there are a total 
of 2 minimum superpatterns, namely 121 and 212, there are 2(n - 2) words 
of length n that satisfy the required conditions. Therefore the probability 
that a random word first becomes a superpattern at its nth letter, for k = d = 2, 
is 

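$$P(\tau = n) = p(2, n) = \frac{2(n-2)}{2^n} = \frac{n-2}{2^{n-1}}, \qquad n \ge 3.$$

It follows that 

$$E(\tau) = \sum_{n \ge 3} \frac{n(n-2)}{2^{n-1}} = \frac{1}{2}\sum_{n \ge 3} \frac{n(n-1)}{2^{n-2}} - \sum_{n \ge 3} \frac{n}{2^{n-1}} = \frac{1}{2}(16 - 2) - (4 - 1 - 1) = 5, \qquad (1)$$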

in contrast to the fact that the waiting time for all words of length 2 to appear 
as subsequences is the waiting time for two disjoint coupon collections of two 
"toys," whose expectation equals 3 + 3 = 6. The variance is 

$$V(\tau) = \sum_{n \ge 3} \frac{n^2(n-2)}{2^{n-1}} - 25 = \frac{1}{4} \cdot 96 + 5 - 25 = 4,$$

and the (rational) generating function is 

$$G_2(t) = \sum_{n \ge 3} \frac{(n-2)\,t^n}{2^{n-1}} = \frac{t^3}{(2-t)^2}.$$
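
The binary-case formulas can be checked numerically. The sketch below is ours, not the authors': it counts by brute force the binary words that first become superpatterns at letter n, then evaluates the mean, the variance, and the generating function at t = 1/2 from the stated distribution. The truncation point `N` is an arbitrary choice.

```python
from fractions import Fraction
from itertools import combinations, product


def dense_rank(seq):
    """Return the preferential arrangement that is order-isomorphic to seq."""
    rank = {value: r for r, value in enumerate(sorted(set(seq)), start=1)}
    return tuple(rank[value] for value in seq)


TARGETS = {(1, 1), (1, 2), (2, 1)}


def is_superpattern(word):
    """True if a binary word contains the arrangements 11, 12 and 21."""
    return TARGETS <= {dense_rank(sub) for sub in combinations(word, 2)}


def first_hit_count(n):
    """Number of binary words that become superpatterns exactly at letter n."""
    return sum(
        1
        for w in product((1, 2), repeat=n)
        if is_superpattern(w) and not is_superpattern(w[:-1])
    )


def pmf(n):
    """P(tau = n) = (n - 2) / 2^(n - 1), the formula from the text."""
    return Fraction(n - 2, 2 ** (n - 1))


N = 200  # truncation point (arbitrary); the neglected tail is negligible
mean = sum(n * pmf(n) for n in range(3, N))
second = sum(n * n * pmf(n) for n in range(3, N))
t = Fraction(1, 2)
series = sum(pmf(n) * t ** n for n in range(3, N))

print([first_hit_count(n) for n in range(3, 9)])
print(float(mean), float(second - mean ** 2))
print(float(series), float(t ** 3 / (2 - t) ** 2))
```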



3 Ternary Alphabet 

The situation becomes vastly more complicated when d = k = 3. By way 
of comparison, we note that the expected waiting time for a single coupon 
collection, i.e., until one of each of the three letters of the alphabet is seen, is 
1 + 1.5 + 3 = 5.5, so that the expected waiting time till each of the 27 ternary 
words is seen as a subsequence is $3 \cdot 5.5 = 16.5$. How much less do we 
expect to have to wait till the string becomes a superpattern that contains 
each of the 13 preferential arrangements of three-letter words on a ternary 
alphabet, namely 111, 112, 121, 211, 122, 212, 221, 123, 132, 213, 231, 312, 
and 321, as subsequences? Throughout the rest of the paper, we will refer 
to superpatterns in the context of this section as superpatterns for $[3]^3$, and 
denote the length of the superpattern by n = n(3, 3). Following the notation 
of [5], let $\pi = \pi_1, \pi_2, \ldots, \pi_k$ be a partition of [n], where $\pi_i$ denotes a block of $\pi$. 
Then $a = (a_1, a_2, \ldots, a_k)$ is a partition of the integer n where $a_i = |\pi_i|$ and 
$a_1 \ge a_2 \ge \cdots \ge a_k$. For example, if n = 7, k = 3, then one such partition 
of 7 is (5, 1, 1), and we will think of this as corresponding to the number of 
letters of the three types in the superpattern. It should be noted that for any 
minimal superpattern no $a_i > \lceil n/2 \rceil$, since this would cause adjacent letters to 
be the same. This fact combined with the following lemma proves very useful 
in determining the word length of superpatterns for $[3]^3$. 
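
The coupon-collection benchmark quoted at the start of this section (5.5 for one collection, 16.5 for three disjoint collections) is the usual sum of geometric waiting times. A minimal sketch with a helper name of our own choosing:

```python
from fractions import Fraction


def expected_collection_time(d):
    """Expected number of draws until all d equally likely letters have appeared."""
    # With `seen` letters already collected, a new one arrives with
    # probability (d - seen) / d, so the wait has mean d / (d - seen).
    return sum(Fraction(d, d - seen) for seen in range(d))


single = expected_collection_time(3)  # 1 + 3/2 + 3
print(single, 3 * single)
print(2 * expected_collection_time(2))  # the binary benchmark 3 + 3
```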

Lemma 3.1. Any superpattern for $[3]^3$ contains a jk and a kj pattern (as 
a subsequence) both before and after at least one i, where $i, j, k \in [3]$ are 
distinct. 

Proof. Let $\sigma$ be a superpattern for $[3]^3$ and let $i, j, k \in [3]$ be distinct. 
Assume $\sigma$ does not contain a jk pattern before an i. Then $\sigma$ does not contain 
the pattern jki and $\sigma$ is not a superpattern for $[3]^3$. This is a contradiction 
and therefore $\sigma$ contains a jk pattern before at least one i. The cases for $\sigma$ 
containing a jk pattern after an i, kj pattern before an i, and kj pattern 
after an i follow in a similar manner. □ 

It is clear, since a word of length at most 5 has at most $\binom{5}{3} = 10 < 13$ 
subsequences of length three, that there are no strict minimal superpatterns 
for n = 3, n = 4, or n = 5. Thus n(3, 3) is at least 6. 

Lemma 3.2. There are no strict minimal superpatterns of length n = 6. 

Proof. The integer 6 can be partitioned into 3 parts in three ways, namely 
(4,1,1), (3, 2,1), and (2, 2, 2). 

Consider a strict minimal superpattern with $(a_1, a_2, a_3) = (4, 1, 1)$. Then 
there exists an $a_i > \lceil 6/2 \rceil = 3$, causing two adjacent letters to be the same 
letter, which contradicts the fact that $\sigma$ is a strict minimal superpattern. 
Next, consider a strict minimal superpattern with $(a_1, a_2, a_3) = (3, 2, 1)$, so 
that $a_3 = 1$. Let i, the singleton letter, be the rth letter of the six letter 
string. Then $r \ge 4$ since there exists both a jk and a kj pattern before i and 
$r \le 3$ since there exists both a jk and a kj pattern after i. Thus no such 
r exists and therefore there is no strict minimal superpattern with 3, 2, and 
1 letters of the three types. Finally, consider a strict minimal superpattern 
with $(a_1, a_2, a_3) = (2, 2, 2)$. Then there does not exist an $a_i \ge 3$ and thus no 
111 pattern exists, which contradicts the fact that we have a strict minimal 
superpattern. □ 

Lemma 3.3. There exist seven strict minimal superpatterns of length n = 7 
up to isomorphism. 

Proof. The integer 7 can be partitioned into 3 parts in four ways, namely 
(5,1,1), (4,2,1), (3, 3,1), and (3, 2, 2). 

Case 1: Consider a strict minimal superpattern corresponding to a (5, 1, 1) 
partition. Then there exists an $a_i > \lceil 7/2 \rceil = 4$, causing two adjacent letters 
to be the same, which contradicts the strict minimality of the superpattern. 
This case is thus vacuous. 

Case 2: Consider a strict minimal superpattern with partition structure 
(4, 2, 1) with $a_i = 1$, $a_j = 4$, and $a_k = 2$. Let the rth letter of the string 
equal i for some $r \in [7]$. Then $r \ge 4$ since there exists both a jk and a kj 
pattern before i and $r \le 4$ since there exists both a jk and a kj pattern after 
i. Therefore r = 4. Since there are four instances of the letter j, and two of 
the letter k, we see that the first three letters of the string must correspond 
to the last three letters of the string. Therefore (up to isomorphism) there 
exists one such strict minimal superpattern having 4, 2, and 1 occurrences 
of the three letters. Denote this superpattern by 1213121. 

Case 3: Consider a strict minimal superpattern with 3, 3, and 1 occurrences 
of the letters. Set $a_i = 1$, $a_j = 3$, $a_k = 3$. Let the rth letter of the 
string be the singleton i. Then $r \ge 4$ since there exists both a jk and a kj 
pattern before i, and $r \le 4$ since there exists both a jk and a kj pattern 
after i. Thus r = 4. Since there are 3 instances of each of the letters j and 
k, we see that the first and last three letters of the string must be comprised 
of jkj and kjk respectively. Up to isomorphism, therefore, there exists just one 
such strict minimal superpattern with partition structure (3, 3, 1); we denote 
it by 1213212. 

Case 4: The case with partition structure (3, 2, 2) is the most complicated 
case with five non-isomorphic solutions. Consider a strict minimal 
superpattern with $a_i = 3$, $a_j = 2$, $a_k = 2$. We focus on the most frequent 
letter. Let the rth, sth and tth letters of the string be i, with 
$1 \le r < s < t \le 7$. Since no two adjacent letters are the same letter, 
$3 \le s \le 5$. 

If s = 3, then r = 1 and t = 5, 6, or 7, since no two adjacent letters 
are the same letter. If t = 5, then there does not exist both a jk and a kj 
pattern before at least one i, which contradicts Lemma 3.1. Therefore $t \ne 5$. 
If t = 6, then we find that Lemma 3.1 is violated no matter in which of the 
six possible ways the two 2's and two 3's are arranged. Thus $t \ne 6$. If t = 7, 
then once again there does not exist a configuration of the other 
four letters for which Lemma 3.1 is satisfied. Thus $t \ne 7$. 

If s = 4, then r = 1 or 2 and t = 6 or 7 since no two adjacent letters are 
the same. If r = 1, t = 6, the only feasible pattern is ijkijik. If r = 1, t = 7, 
there are two solutions, namely ijkijki and ijkikji. If r = 2, t = 6, the 
single solution is jikijik, and, finally, if r = 2, t = 7, the single solution is 
jikijki. 

It can be shown that no additional solutions exist for s = 5. This completes 
the proof. □ 

Corollary 3.4. The length of a minimum superpattern for $[3]^3$ is n(3, 3) = 7. 

Burstein et al. [6] give a constructive proof for $n(l, l) \le l^2 - 2l + 4$ and 
conjecture that $n(l, l) = l^2 - 2l + 4$. The corollary above characterizes the 
solutions for the case l = 3. The seven unique strict minimal superpatterns of 
length n = 7, up to isomorphism, are 1213121, 1213212, 1231213, 1231231, 
1231321, 1232123, and 1232132. Since there are 3! ways to permute the 
letters isomorphically in each strict minimal superpattern of length n = 7, 
we obtain a total of $3! \cdot 7 = 42$ strict minimal superpatterns of length n = 7. 
These are also the minimum superpatterns. 
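
The counts in Lemmas 3.2 and 3.3 are small enough to confirm by exhaustive search. The following sketch (our own helper names, not the authors' method) tests every ternary word of length 6 and 7 against the 13 preferential arrangements and groups the length-7 superpatterns by relabelling of the alphabet.

```python
from itertools import combinations, permutations, product


def dense_rank(seq):
    """Return the preferential arrangement that is order-isomorphic to seq."""
    rank = {value: r for r, value in enumerate(sorted(set(seq)), start=1)}
    return tuple(rank[value] for value in seq)


# The 13 preferential arrangements of length three.
ARRANGEMENTS = {w for w in product((1, 2, 3), repeat=3) if dense_rank(w) == w}


def is_superpattern(word):
    """True if word contains every arrangement as a subsequence pattern."""
    return ARRANGEMENTS <= {dense_rank(sub) for sub in combinations(word, 3)}


def superpatterns(n):
    """All ternary words of length n that are superpatterns."""
    return [w for w in product((1, 2, 3), repeat=n) if is_superpattern(w)]


def canonical(word):
    """Lexicographically smallest relabelling of word under the 3! letter permutations."""
    return min(tuple(p[x - 1] for x in word) for p in permutations((1, 2, 3)))


six = superpatterns(6)
seven = superpatterns(7)
classes = sorted({canonical(w) for w in seven})

print(len(ARRANGEMENTS), len(six), len(seven), len(classes))
for w in classes:
    print("".join(map(str, w)))
```

The printed representatives can be compared directly with the seven words listed above.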

Next, we consider the total number of minimal superpatterns, up to 
isomorphism, for any length $n \ge 8$. Since all minimal superpatterns 
consist of an alternating pattern (no two adjacent letters agree), the first two 
letters can, up to isomorphism, be fixed as i and j for $i, j \in [3]$ with $i \ne j$. There exist $2^{n-2}$ 
total words on the remaining n - 2 positions that have alternating patterns 
since each letter can be chosen in two ways. However, not all of these $2^{n-2}$ 
words will result in a $[3]^3$-superpattern of length n. The following lemma 
aids in determining the number of candidate n-strings which fail to create a 
superpattern of $[3]^3$; this number, up to isomorphism, ends up being $(n-2)^2$. 

Lemma 3.5. Any strict minimal n-superpattern for $[3]^3$ contains a minimum 
superpattern for $[3]^3$ with the last letter of the minimum superpattern 
occurring on the last letter of the superpattern. 

Proof. Consider, up to isomorphism, a strict minimal superpattern $\sigma$ of 
length n for $[3]^3$. Let {i, j, k} = [3]. Without loss of generality, let $\sigma(n) = i$ 
and $\sigma(n-1) = k$. Then there exists some $\sigma(b_1) = i$ as the first occurrence 
of i in $\sigma$, and, without loss of generality, there exists $(\sigma(c_1), \sigma(c_2)) = (k, j)$ 
with $\sigma(c_1) = k$ as the first occurrence of k in $\sigma$, and $\sigma(c_2) = j$ as the last 
occurrence of j in $\sigma$ where $b_1 < c_2 < n - 1$ since there exists both a jk and 
a kj pattern after at least one i. If $b_1 > 3$ then there exists a jk and a kj 
pattern before it, causing $\sigma$ to contain either a jkjikjk or a kjkikjk pattern, 
both of which are strict superpatterns of length n = 7 and therefore $\sigma(n) = i$ 
is unnecessary for the containment of all preferential arrangements. This 
contradicts the given fact that $\sigma$ is a strict minimal superpattern. Therefore 
$b_1 \le 3$. 

Case 1: If $b_1 = 3$, then $(\sigma(1), \sigma(2)) = jk$ or kj. If $(\sigma(1), \sigma(2)) = jk$, then 
$\sigma$ contains the minimum superpattern jkikjki with the last letter of the 
minimum superpattern occurring on the last letter of $\sigma$. If $(\sigma(1), \sigma(2)) = kj$, 
then $\sigma$ contains the minimum superpattern kjikjki with the last letter of 
the minimum superpattern occurring on the last letter of $\sigma$. 

Case 2: If $b_1 = 2$, then $\sigma(1) = j$ or k. If $\sigma(1) = j$, then there exists the 
pattern ki before $\sigma(c_2) = j$ since there exists a ki pattern before at least one 
j and thus it must also exist before the last j. Then $\sigma$ contains the minimum 
superpattern jikijki with the last letter of the minimum superpattern 
occurring on the last letter of $\sigma$. If $\sigma(1) = k$ (here $c_1 = 1$), then there exists 
a ji pattern before $\sigma(n-1) = k$ since there exists a ji pattern before at least 
one k and $\sigma(n-1)$ is the last occurrence of k. Since no two adjacent letters 
are the same letter, $\sigma(3) = j$ or k. If $\sigma(3) = j$, then (noting that there must 
be a k between the third spot and the $c_2$th) $\sigma$ contains either a kijikji on 
the first n letters, or a kijkijk or kijkjik pattern on the first n - 1 letters. 
In the first case, $\sigma$ contains a minimum superpattern kjikjki with the last 
letter of the minimum superpattern occurring on the last letter of $\sigma$. In the 
second and third case, we find embedded minimum superpatterns on n - 1 
letters, and therefore $\sigma(n) = i$ is unnecessary for the containment of all 
preferential arrangements. This contradicts the given fact that $\sigma$ is a strict 
minimal superpattern. If $\sigma(3) = k$, then $\sigma$ contains the minimum superpattern 
kikjiki with the last letter of the minimum superpattern occurring on 
the last letter of $\sigma$. This is because there must be an ik and a ki after some 
j. 

Case 3: If $b_1 = 1$, then $\sigma(2) = j$ or k. If $\sigma(2) = j$, then there exists a ki 
pattern before $\sigma(c_2) = j$. Therefore $\sigma$ contains the minimum superpattern 
ijkijki with the last letter of the minimum superpattern occurring on the 
last letter of the string. If $\sigma(2) = k$, then $\sigma(3) = i$ or j. If $\sigma(3) = i$, note that 
there exists a ji pattern before $\sigma(n-1) = k$. Thus $\sigma$ contains the minimum 
superpattern ikijiki with the last letter of the minimum superpattern occurring 
on the last letter of the string. If $\sigma(3) = j$, note that there exists a ki 
pattern (where $\sigma(2) = k$ is the k of the pattern) before $\sigma(c_2) = j$ since there 
exists a ki pattern before at least one j, and $\sigma(c_2)$ is the last occurrence of 
j. Thus the string contains the minimum superpattern ikjijki with the last 
letter of the minimum superpattern occurring on the last letter of $\sigma$. 

Since any $i, j, k \in [3]$ can be permuted by isomorphisms, all strict minimal 
n-superpatterns for $[3]^3$, $n \ge 8$, contain a minimum superpattern with the 
last letter of the minimum superpattern occurring on the last letter of the 
string. This completes the proof. □ 

It now follows that the strict minimal strings that fail to create a 
superpattern of $[3]^3$ do not contain a complete embedding of one of the strict 
minimal superpatterns of length seven (again, for n = 7 these are the same 
as the minimum superpatterns), since by Lemma 3.5 all strict minimal 
superpatterns contain a strict minimal superpattern of length seven. All the 
words contain some portion of a strict minimal superpattern of length seven 
up to isomorphism, since the first two letters are fixed as i and j and each 
strict minimal superpattern of length seven can be written in the same manner. 
Let an "i-fold progression" count the number of the $2^{n-2}$ words which 
begin with ij and contain the first through the ith letters of a unique strict 
minimal superpattern of length seven, but not the (i + 1)st letter. Then 2-fold 
progression is guaranteed by the fixed i and j occurring on the first and second 
positions of each word. The third position must be an i or a k since no 
two adjacent letters are the same letter. Let the strict minimal superpatterns 
of length seven with the first three positions containing the pattern iji be 
called type A patterns, with the strict minimal superpatterns of length seven 
with the first three positions containing the pattern ijk being called type B 
patterns. 

First, consider the strict minimal superpatterns of type A, namely ijikiji 
and ijikjij, where $i, j, k \in [3]$ are distinct. A word that satisfies 3-fold 
progression contains the pattern iji on the first three positions, but no k 
afterwards. There is one such word, namely ijijij . . ., which satisfies a 3-fold 
progression. 

For a 4-fold progression to occur, the word must contain the pattern 
iji on the first three positions followed by a k which has no i or j after 
it, otherwise a 5-fold progression will occur. There is only one such word, 
namely ijijij . . . k, where the only occurrence of k is at the end of the word. 

There are 2(n - 4) type A words that exhibit a 5-fold progression, namely 
any word which follows the pattern ijijij . . . kikiki . . . or ijijij . . . kjkjkj . . ., 
where the k can be inserted in any position other than the first, second, third, 
or nth. 

In order for a word to contain a 6-fold progression, it must contain the 
5-fold progression ijijij . . . kikiki . . . followed by a j, or the ijijij . . . kjkjkj . . . 
pattern followed by an i. This corresponds to all the ways in which two 
non-consecutive choices can be made from n - 3 spots for the k and the sixth letter 
of the progression, so there are $2\binom{n-4}{2}$ such words, namely 
ijijij . . . kikiki . . . jkjkjk . . . and ijijij . . . kjkjkj . . . ikikik . . . . 

Therefore the total count for the number of words which do not contain 
a complete embedding of one of the type A strict minimal superpatterns of 
length seven is 

$$P_A(n) = 1 + 1 + 2(n-4) + 2\binom{n-4}{2} = n^2 - 7n + 14.$$

Next, consider the strict minimal superpatterns of type B, namely ijkijki, 
ijkikji, ijkijik, ijkjijk and ijkjikj, where $i, j, k \in [3]$ are distinct. There 
exist no words that satisfy a 3-fold progression since all words containing the 
pattern ijk on the first three positions contain either an i or a j immediately 
afterwards and there exists either the pattern ijki or the pattern ijkj on at 
least one of the strict minimal superpatterns of type B, causing at least a 
4-fold progression to occur. 

For a 4-fold progression to occur, the word must contain either the pattern 
ijki on the first four positions with no j or k afterwards, which is impossible, 
or the pattern ijkj on the first four positions with no i afterwards, otherwise 
a 5-fold progression will occur. There is only one such word, namely 
ijkjkjk . . . . 

For a 5-fold progression to occur using the pattern ijki as a basis pattern 
on the first four positions, the word must contain either the pattern ijkij on 
the first five positions with no i or k afterwards, which is impossible, or the 
pattern ijkik on the first five positions with no j afterwards, otherwise a 6-fold 
progression will occur. There is only one such word, namely ijkikiki . . . . 
For a 5-fold progression to occur using the pattern ijkj as a basis pattern 
on the first four positions, there is only one possibility, namely ijkjkjk . . . i, 
where the only occurrence of i after position four is at the end of the word. 
Since any other occurrence of i on the (n - 5) remaining positions (other than 
the last position) results in a 6-fold progression, there are n - 5 ways for the 
word to contain a 6-fold progression for each possible letter that can follow i 
using the pattern ijkj as a basis pattern on the first four positions. A 6-fold 
progression is contained in the word if the pattern ijkjij is not followed by 
a k or the pattern ijkjik is not followed by a j. There are 2(n - 5) such 
words. A word can also contain a 6-fold progression using the pattern ijkij 
as a basis pattern on the first five positions if the word contains either the 
pattern ijkiji on the first six positions with no k afterwards or the pattern 
ijkijk on the first six positions with no i afterwards. There exists only one 
such word for each of these cases, namely ijkijijij . . . and ijkijkjkjk . . . . 
Lastly, a word can also contain a 6-fold progression if it contains the pattern 
ijkik on the first five positions followed by a j on one of the n - 5 remaining 
positions that is not followed by an i. There are n - 5 such words, namely 
any word that follows the pattern ijkikiki . . . jkjkjk . . . . Therefore the total 
count for the number of words which do not contain a complete embedding 
of one of the type B strict minimal superpatterns of length seven is 

$$P_B(n) = 1 + 1 + 1 + 2(n-5) + 1 + 1 + (n-5) = 3n - 10.$$

Therefore the total number of words that do not contain a complete 
embedding of one of the strict minimal superpatterns of length seven and 
thus fail to create a superpattern of $[3]^3$ is 

$$P_{total}(n) = n^2 - 7n + 14 + 3n - 10 = (n-2)^2,$$

making the total number of minimal superpatterns of any length $n \ge 7$, up 
to isomorphism, equal to 

$$T_{total}(n) = 2^{n-2} - (n-2)^2.$$

The sequence generated by $T_{total}(n)$ existed previously in [15] as entry number 
A024012, but with little context. We have now added the "superpattern 
origin" of the sequence to that OEIS entry. 
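
For small n the count of minimal superpatterns can be compared with the closed form by direct enumeration. The sketch below is an illustration under our own naming: it lists the words that start with 12 and have no two equal adjacent letters, tests each for the superpattern property, and prints the brute-force count next to $2^{n-2} - (n-2)^2$ so the two can be compared.

```python
from itertools import combinations, product


def dense_rank(seq):
    """Return the preferential arrangement that is order-isomorphic to seq."""
    rank = {value: r for r, value in enumerate(sorted(set(seq)), start=1)}
    return tuple(rank[value] for value in seq)


ARRANGEMENTS = {w for w in product((1, 2, 3), repeat=3) if dense_rank(w) == w}


def is_superpattern(word):
    """True if word contains all 13 arrangements as subsequence patterns."""
    return ARRANGEMENTS <= {dense_rank(sub) for sub in combinations(word, 3)}


def minimal_words(n):
    """Ternary words of length n that start with 12 and have no equal adjacent letters."""
    words = [(1, 2)]
    for _ in range(n - 2):
        words = [w + (x,) for w in words for x in (1, 2, 3) if x != w[-1]]
    return words


def count_minimal_superpatterns(n):
    """Brute-force count of minimal superpatterns of length n, up to isomorphism."""
    return sum(1 for w in minimal_words(n) if is_superpattern(w))


for n in range(7, 11):
    print(n, count_minimal_superpatterns(n), 2 ** (n - 2) - (n - 2) ** 2)
```

Fixing the first two letters as 1 and 2 plays the role of working "up to isomorphism", since every word with distinct first two letters has exactly one relabelling of that form.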

Lemma 3.6. For all $n \ge 7$, the total number $S_M(n)$ of strict minimal 
superpatterns of length n is given by $S_M(n) = (n-4)^2 - 2$. 

Proof. Up to isomorphism, the number of strict minimal superpatterns of 
length n will equal the total number of minimal superpatterns of length n 
minus any non-strict minimal superpatterns of length n. The total number of 
non-strict minimal superpatterns of length n is equal to the total number of minimal 
superpatterns of length n - 1 times 2, since the last letter is unnecessary in a 
non-strict superpattern for the completion of any preferential arrangement of 
$[3]^3$, making the word on the first n - 1 letters a valid minimal superpattern 
of length n - 1 and there are 2 choices for the nth letter since no two adjacent 
letters in the word are the same letter. Therefore, 

$$S_M(n) = [2^{n-2} - (n-2)^2] - 2[2^{n-3} - (n-3)^2] = (n-4)^2 - 2,$$

as asserted. The sequence generated by $S_M(n)$ existed as entry number 
A008865 in [15], but with little context. We have added the above origin. □ 
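
The algebra in the proof is easy to confirm mechanically. A minimal sketch (function names are ours) that evaluates both sides of the identity for a range of n:

```python
def total_minimal(n):
    """T_total(n) = 2^(n-2) - (n-2)^2, the count of minimal superpatterns."""
    return 2 ** (n - 2) - (n - 2) ** 2


def strict_minimal(n):
    """S_M(n) as the difference used in the proof: T_total(n) - 2 * T_total(n-1)."""
    return total_minimal(n) - 2 * total_minimal(n - 1)


for n in range(7, 13):
    print(n, strict_minimal(n), (n - 4) ** 2 - 2)
```

The powers of two cancel, leaving $-(n-2)^2 + 2(n-3)^2 = n^2 - 8n + 14 = (n-4)^2 - 2$.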

Lemma 3.7. The number $S_a(n)$ of strict n-superpatterns in which there 
exist possible occurrences of adjacent repeated letters is given by 

$$S_a(n) = \sum_{m=7}^{n} [(m-4)^2 - 2]\binom{n-2}{m-2}.$$

Proof. Any strict superpattern of length n in which there exist occurrences 
of two adjacent and repeated letters will contain an embedded occurrence 
of a strict minimal superpattern of length m, where $7 \le m \le n$. Therefore 
all such superpatterns are found by inserting n - m letters which cause two 
adjacent letters to be the same into strict minimal superpatterns of length 
m. 

These insertions can duplicate any letter of the word except the 
last letter, since an occurrence of two adjacent letters as the same letter at 
the end of the word contradicts the strictness of the superpattern. Therefore 
there are n - m insertions of identical "balls" into m - 1 possible positions 
and there are $\binom{(m-1)+(n-m)-1}{m-2} = \binom{n-2}{m-2}$ ways to do this. Since this insertion of 
the appropriate number of repeats can be done for all strict minimal 
superpatterns of length m, $7 \le m \le n$, 

$$S_a(n) = \sum_{m=7}^{n} S_M(m)\binom{n-2}{m-2} = \sum_{m=7}^{n} [(m-4)^2 - 2]\binom{n-2}{m-2},$$

which finishes the proof. □ 
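
The balls-in-boxes count used in the proof can be checked by brute force for small cases. This is only an illustrative sketch; `count_insertions` is a name of our own, and the enumeration is exponential in m.

```python
from itertools import product
from math import comb


def count_insertions(m, n):
    """Ways to distribute n - m identical repeats over the first m - 1 letters."""
    extra, slots = n - m, m - 1
    return sum(
        1
        for alloc in product(range(extra + 1), repeat=slots)
        if sum(alloc) == extra
    )


for m, n in [(7, 7), (7, 8), (7, 9), (8, 10), (7, 10)]:
    print(m, n, count_insertions(m, n), comb(n - 2, m - 2))
```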



We now state the main result of this paper: 

Theorem 3.8. For all $n \ge 7$ the total number of strict superpatterns of length 
n is given by $S(n) = 6\sum_{m=7}^{n} [(m-4)^2 - 2]\binom{n-2}{m-2}$, and thus the probability 
distribution of the waiting time $\tau$ for all preferential arrangements of $[3]^3$ to 
occur as a subsequence is 

$$P(\tau = n) = p(3, n) = \frac{6}{3^n}\sum_{m=7}^{n} [(m-4)^2 - 2]\binom{n-2}{m-2}.$$

Proof. The first part of the proof follows immediately from Lemma 3.7 and 
the fact that there are 6 isomorphic arrangements for any superpattern. The 
second part follows due to the immediate correspondence between a strict 
superpattern and the waiting time, and the fact that each of the $3^n$ sequences 
is equally likely. This completes the proof. □ 
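
The distribution in Theorem 3.8 can be evaluated exactly with rational arithmetic. The sketch below (our own function names; the cutoff 200 is an arbitrary truncation) computes S(n) and p(3, n) and checks that the probabilities sum to one up to a negligible tail.

```python
from fractions import Fraction
from math import comb


def strict_superpattern_count(n):
    """S(n) = 6 * sum_{m=7}^{n} [(m-4)^2 - 2] * C(n-2, m-2)."""
    return 6 * sum(
        ((m - 4) ** 2 - 2) * comb(n - 2, m - 2) for m in range(7, n + 1)
    )


def pmf(n):
    """P(tau = n) = S(n) / 3^n, as an exact fraction."""
    return Fraction(strict_superpattern_count(n), 3 ** n)


print(strict_superpattern_count(7), strict_superpattern_count(8))
print(pmf(7))
partial = sum(pmf(n) for n in range(7, 201))
print(float(partial))
```

The value S(7) = 42 agrees with the count of minimum superpatterns obtained after Corollary 3.4.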



Computation of moments is now routine. We have 

$$E(\tau) = \sum_{n=7}^{\infty} \frac{6n}{3^n}\sum_{m=7}^{n} [(m-4)^2 - 2]\binom{n-2}{m-2} = 6\sum_{m=7}^{\infty} (m^2 - 8m + 14)\sum_{n=m}^{\infty} \frac{n}{3^n}\binom{n-2}{m-2} = \cdots = 13.5625,$$
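
The value 13.5625 can be reproduced numerically from the distribution in Theorem 3.8. In the sketch below, `g3` is the closed form for the generating function $G_3(t) = 2t^7(16t^2 - 63t + 63) / ((3-t)^5 (3-2t)^3)$ as we read it from the source; treat that reading, the helper names, and the truncation point `N` as assumptions of this example.

```python
from math import comb


def pmf(n):
    """P(tau = n) from Theorem 3.8, evaluated in floating point."""
    s = sum(((m - 4) ** 2 - 2) * comb(n - 2, m - 2) for m in range(7, n + 1))
    return 6 * s / 3 ** n


def g3(t):
    """Assumed closed form of G_3(t), as read from the source."""
    return 2 * t ** 7 * (16 * t ** 2 - 63 * t + 63) / ((3 - t) ** 5 * (3 - 2 * t) ** 3)


N = 300  # truncation point; the neglected tail is far below 1e-12
mean = sum(n * pmf(n) for n in range(7, N))
series_at_half = sum(pmf(n) * 0.5 ** n for n in range(7, N))

print(mean)
print(g3(1.0), series_at_half, g3(0.5))
```

Agreement between the truncated series and `g3(0.5)` is a consistency check on the closed form rather than a proof of it.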



and similar computations, not shown in detail, yield the generating function 

$$G_3(t) = \frac{2t^7(16t^2 - 63t + 63)}{(3-t)^5 (3-2t)^3}.$$

4 Open Questions 

The key questions we would like to see resolved are as follows: (i) Can other 
methods, particularly generating function techniques [17] or the Markov 
chain embedding technique [10], [H] be used to give alternative proofs of 
our results and lead to generalizations for alphabets of size higher than 3? 
One major complication to note is that a minimum superpattern for $[4]^4$ of 
length 12 can be constructed using the construction method found in work by 
Burstein et al., but there exist strict superpatterns for $[4]^4$ of lengths larger 
than 12 which do not contain one of the minimum superpatterns. One such 
example can be constructed using two copies of type A strict superpatterns 
for $[3]^3$ separated by a 4, i.e., 121312141213121. (ii) For d > 3, can we obtain 
the exact distribution, in a not-too-complicated form, for the waiting time 
till all the words of length k are obtained as subsequences? NOTE: This 
would be the waiting time for the completion of k disjoint non-overlapping 
renewals of coupon collections with d tokens; see pp. 